A semi-supervised regression model for mixed numerical and categorical variables

Authors

  • Michael K. Ng
  • Elaine Y. Chan
  • Mee Chi So
  • Wai-Ki Ching
Abstract

In this paper, we develop a semi-supervised regression algorithm to analyze data sets which contain both categorical and numerical attributes. This algorithm partitions the data sets into several clusters and at the same time fits a multivariate regression model to each cluster. This framework allows one to incorporate both multivariate regression models for numerical variables (supervised learning methods) and k-modes clustering algorithms for categorical variables (unsupervised learning methods). The estimates of regression models and k-modes parameters can be obtained simultaneously by minimizing a function which is the weighted sum of the least squares errors in the multivariate regression models and the dissimilarity measures among the categorical variables. Both synthetic and real data sets are presented to demonstrate the effectiveness of the proposed method.
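The alternating scheme the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fit_clusterwise`, the weight `gamma`, the random initialization, and the use of simple matching (Hamming) mismatch against per-cluster modes as the categorical dissimilarity are all assumptions made for illustration. Each point is assigned to the cluster that minimizes its squared regression residual plus `gamma` times its categorical mismatch count, and then each cluster's regression coefficients and modes are refit.

```python
import numpy as np

def fit_clusterwise(X, y, C, k, gamma=1.0, n_iter=20, seed=0):
    """Hypothetical sketch: joint clusterwise regression and k-modes.

    X: (n, p) numerical predictors, y: (n,) response,
    C: (n, q) integer-coded categorical attributes.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])      # prepend an intercept column
    labels = rng.integers(0, k, size=n)        # random initial partition
    betas = np.zeros((k, Xb.shape[1]))
    modes = np.zeros((k, C.shape[1]), dtype=C.dtype)
    for _ in range(n_iter):
        # Update step: refit regression and modes within each cluster.
        for j in range(k):
            members = labels == j
            if not members.any():
                # Assumption: re-seed an empty cluster with one random point.
                members = np.zeros(n, dtype=bool)
                members[rng.integers(n)] = True
            betas[j] = np.linalg.lstsq(Xb[members], y[members], rcond=None)[0]
            for a in range(C.shape[1]):
                vals, counts = np.unique(C[members, a], return_counts=True)
                modes[j, a] = vals[np.argmax(counts)]
        # Assignment step: squared residual plus weighted categorical mismatch.
        resid2 = (y[:, None] - Xb @ betas.T) ** 2                    # (n, k)
        mismatch = (C[:, None, :] != modes[None, :, :]).sum(axis=2)  # (n, k)
        cost = resid2 + gamma * mismatch
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    total = cost[np.arange(n), labels].sum()
    return labels, betas, modes, total
```

In this sketch, a larger `gamma` lets the categorical attributes dominate the partition, while `gamma = 0` reduces it to clusterwise linear regression on the numerical variables alone. Like k-means, the procedure may stop at a local minimum, so several random restarts are advisable.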

Similar resources

A Semi-supervised Learning Framework to Cluster Mixed Data Types

We propose a semi-supervised framework to handle diverse data formats or data with mixed-type attributes. Our preliminary results in clustering data with mixed numerical and categorical attributes show that the proposed semi-supervised framework gives better clustering results in the categorical domain. Thus the seeds obtained from clustering the numerical domain give additional knowledge to ...

Semi-supervised Learning for Mixed-Type Data via Formal Concept Analysis

We propose a semi-supervised learning (SSL) method, called SELF (SEmi-supervised Learning via FCA), using Formal Concept Analysis (FCA). It can handle mixed-type data containing both discrete and continuous variables; numerical data are discretized by binary encoding. ...

Semi-supervised Gaussian Process Ordinal Regression

Ordinal regression problem arises in situations where examples are rated in an ordinal scale. In practice, labeled ordinal data are difficult to obtain while unlabeled ordinal data are available in abundance. Designing a probabilistic semi-supervised classifier to perform ordinal regression is challenging. In this work, we propose a novel approach for semi-supervised ordinal regression using Ga...

Categorical Reparameterization with Gumbel-Softmax

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a nove...

Journal:
  • Pattern Recognition

Volume 40

Publication date: 2007